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Abstract 

Reconstruction of biochemical reaction networks is a central topic in systems biology which raises 
crucial theoretical challenges in system identification. Nonlinear Ordinary Differential Equations (ODEs) 

£N) that involve polynomial and rational functions are typically used to model biochemical reaction networks. 

Such nonlinear models make the problem of determining the connectivity of biochemical networks from 
time-series experimental data quite difficult. In this paper, we present a network reconstruction algorithm 
that can deal with model descriptions under the form of polynomial and rational functions. Rather than 
identifying the parameters of linear or nonlinear ODEs characterised by pre-defined equation structures, 
our methodology allows us to determine the nonlinear ODEs structure together with their associated 
reaction constants. To solve the network reconstruction problem, we cast it as a Compressive Sensing 

, , (CS) problem and use Bayesian Sparse Learning (BSL) algorithms as an efficient way to obtain its 

^ solution. 

q I. INTRODUCTION 

A long standing problem in systems biology is to reconstruct biochemical reaction networks. 
Reconstruction means to identify both the topology and the parameter values of some bio- 
chemical reaction networks. More specifically, network reconstruction tries to recover the set 
of nonlinear ODEs associated with the biochemical processes from time-series experimental 
data. A naive reconstruction method consists in searching among all possible reactions the few 
that seem consistent with the time series. The associated computational burden of such a naive 
approach is typically horrendous even for network of modest dimensions. Within the systems and 
control community, identification of biochemical reaction networks, genetic regulatory networks 
in particular, are quite active research areas [1], [17], [18], [20], [22], [29]. 

Many linear and nonlinear functions can be used to describe the dynamics of biochemical 
reaction networks in terms of biochemical kinetic laws, e.g., as first-order functions /([S]) = 
[S], mass action functions f([Si],[S 2 ]) = [Si] ■ [S 2 ] , Michaelis-Menten functions f([S]) = 
Knax [S] /(K M + [S]), Hill functions f([S\) = V max [S] n /(K M + [S] n ). Furthermore, it is not 
uncommon that a single gene is regulated by more than one transcription factor. In such situations, 
the combined effect of these regulators on gene expression needs to be described by a multi- 
dimensional input function. Such input functions typically take the form of ratio of polynomials 
involving the concentrations of the input transcription factors i = l,...,n, for example, 
f(x u ■■■,x n )=J2i Pi{xi/ K i) n */ (1 + J2i Pi{xi/Ki) mi ), where K { is the activation or repression 
coefficient for the transcription factor Xi, fa is its maximal contribution to gene expression, and 
the Hill coefficients are n» = m 8 > for activators and rii = 0, rrii > for repressors. These 
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types of functions have been shown to appropriately describe experimentally determined input 
functions [21]. More complicated expressions are also possible, e.g., if the different transcription 
factors interact with each other at the protein level [4]. 

Our main objective is, given experimental time-series data, to identify both the interconnection 
topology (the form of the nonlinear functions) and their associated parameters. Without prior 
knowledge on model structure, this is still an open problem in system identification. 

During his plenary talk at the 50th IEEE CDC, Prof. Lenart Ljung emphasised on four 
opportunities for further research and development in system identification. Two of these were: (a) 
the use of sparse datasets and (b) the use of machine learning approaches [15]. The approach that 
we consider in this paper is aligned with these two recommendations. Specifically, our approach 
draws inspiration from the fields of signal processing and machine learning, by combining 
compressive sensing and Bayesian sparse learning to offer an efficient method for network 
reconstruction. 

The paper is organised as follows. In Section we introduce the type of biochemical reaction 
networks model we consider in this paper, i.e., discrete Langevin ODEs. In Section |In| we 
formulate the network reconstruction problem associated with the model class proposed in 
Section [TTJ. In Section [IV] we show how the reconstruction problem can be converted into a 
compressive sensing problem. In Section |V} we show how Bayesian sparse learning algorithms 
can be used to solve the compressive sensing problem. In Section [VT| we apply our method to 
the reconstruction of the repressilator and show how it can be reconstructed almost exactly using 



the compressive sensing framework. Finally, in Section VII we conclude and discuss several 
future problems that we plan to address. 

II. MODEL FORMULATION 
We consider dynamical systems of the following general form: 

!=«(,), a) 

where x G W l , S G M nxL , and /(•) G R L is a vector of L unknown functions. This set of 
functions can, for example, include mass action kinetic terms under the form of product of 
monomials, monotonically increasing or monotonically decreasing Hill functions, simple linear 
terms, etc. Let neither the value of the entries nor the structure of matrix S be known. Given 
experimental data in terms of time series of state x, our objective is to identify the values in S. 
Applying standard Euler discretisation to ([T]), we obtain: 

x{tk+i) = x(t k ) + (ifc+i - h)Sf(x(t k )), (2) 

Assuming that the discretisation step t k+ i — t k = e is constant for all k, and defining the error 
between two successive state values as e(t k+ i) = x(t k+ i) — x(t k ), we can write ^ under the 
general form 

e(t fc+ i) = v T /(x(t fc )), (3) 

where e = [e 1; e 2 , . . . , e n ] T G IR nxl denotes the error vector; x = [xi, x 2 , ■ ■ ■ , x n ] T G IR nxl de- 

v n ... v L1 

eS G IR nxL ; and f(x) = [A(x), f 2 (x), . . . , f L (x)} T G 



notes the state vector; v T 
R Lxl is the vector field. 



_v ln ... v Ln . 
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Typically biochemical reaction networks exhibit stochastic behaviours due to intrinsic and 
extrinsic noise at the gene expression level. Furthermore, the acquisition of data is typically 
affected by the inaccuracy of the measurement devices as well as by unpredictable environmental 
perturbations. It is therefore more realistic to consider Langevin nonlinear dynamics: 

e(t k+1 )=v T f(x(t k ))+Z(tk). (4) 

£(tfc) € M n represents energy-bounded process noises which are assumed to be independent, to 
be distributed according to normal probability distributions: 

E[£{t k )] = 0, 

E^fM = QkSkj, 

where 

In the following section, we shall propose an algorithm that uses time-series data to reconstruct 
the eq. <Q. 

III. RECONSTRUCTION PROBLEM FORMULATION 

A. Problem Formulation 

Consider the discrete-time model in eq. Q: 

e T (t k+1 ) = f T (x(t k ))v + f(t k ) } (6) 
Assuming that M successive data samples are obtained from ([6]) and defining 

r 



yi • • • 


Yn] = [ e (*l 


)... 


e(t M 




... e n (ti) 






ei(h) 


■ ■ ■ e n (t 2 ) 




^Mxn 






G I 




ei(t M ) 


■ ■ ■ Cn(^Af) . 
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Eq. (|6]) can be written 



0v + S, 



or 



yi = 6vi + & 



,n) 



(7) 



(8) 



We want to find given measured data stored in y,. This a typical linear regression problem 
that can be solved using standard least square approaches, provided that the structure of the 
nonlinearities in the model are known, i.e., provided that is known. However, in most cases, 
these nonlinearities are unknown or difficult to assume a priori, and as a consequence, itself 
is unknown and therefore Vj cannot be solved from eq. ([8]). 

Depending on the field for which the dynamical model needs to be built, only a few typical 
nonlinearities specific to this field need to be considered. For example, the class of models that 
arise from biochemical reaction networks typically involve nonlinearities that capture fundamen- 
tal biochemical kinetic laws, e.g., first-order degradation functions, mass-action kinetics, Hill 
and Michaelis-Menten functions. In what follows we gather in a matrix $ similar to the set 
of all candidate basis functions that we want to consider for reconstruction: 

F l (x(t )) ... F N (x(t )) 
Fi(z(*i)) ... F w (x(t!)) 



_F l (x(t M _ 1 )) ... F N (x(t M -i)) 
This leads to equations similar to ([8]): 

y< = $Wi + &, (i = l,...,n). 

where 



>MxN 



(9) 



w 



[ Wi ... w n ] 

Wn . . . LOin 



(10) 



(11) 



) JVxn 



_ Co>ati . . . UJ jy n _ 

We will introduce in section [IV] a method that allows us to reconstruct Wj from time-series 
observations of x and yj. Before explaining this method, we introduce in the following two sec- 
tions the most commonly used nonlinear functions that appear in nonlinear ODEs used to model 
biochemical reaction networks and explain how, based on these functions, the reconstruction 
problem amounts to infer $. 



B. Polynomial Input Functions 

A Monomial m a in n variables is a function defined for 
a E Z™. The degree of a monomial is defined as degm a := Ym=i a i- Polynomials being 
decomposable into sums of monomial terms, the elements Fj(x) appearing in the basis function 
matrix $ can be represented as monomials of the form: 

Fj(x) =< 1 X2 2 ...<", (12) 

where x^ (k = 1, . . . , n) is the kth component of the dynamical variable x. As an illustration, 
we consider the case of n = 2 variables xi, x 2 . In that case, we have the following basis 
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functions to represent polynomials: Fi(x) = x^x^ = 1, F 2 (x) = x\x 2 = x i> F 3 (x) = x\x\ = 
x 2 , • • • , F N (x) = x s ^x b 2 2 , where the exponents s x and s 2 are the maximal possible stoichiometric 
coefficients for X\ and x 2 in biochemical reactions [19]. 

Given an upper bound on the degree of the polynomials that we want to consider as potential 
candidates in the reconstruction process, we thus need to consider as elements Fj(x) of $ all 
possible independent monomials of degree less or equal to the maximal degree of the candidate 
polynomial. 



C. Rational Input Functions 

Another common type of nonlinearities in dynamical models of biochemical reaction net- 
works are the activation and repression Hill and Michaelis-Menten functions. To include these 
nonlinearities as potential candidates in the reconstruction process, we consider the general 
multi-dimensional input function introduced in Section [j] 



GAx) 



1 • Er^-OM', 



where and Kj are parameters to be identified and the exponents rij and rrij are assumed to 
be fixed. The cases of gene activation or repression are encompassed by this general form: gene 
activation corresponds to the situation where rij = rrij > whereas gene repression corresponds 
to the case where rij = 0. 

To give an idea on how to deal with rational rate terms, we assume a general model of the 
form as described in ([3]): 

J2j '>/.,(•'',./ (//.•).• Fjj)" • 



l + E;/%(^)/^) mi - 
By multiplying both sides of equation ( |T3"J ) by 1 + J2j Pij( x ij(tk) I Kij) mij , we obtain 

1 + Y) PijMt^/Kij)™*'} ei (t k+1 ) 

= -t< [1 + J2j PiiMtkyKij)™"] Xi(t k ) 

+ ^2.a lj {x j {t k )/K lj pi 
1) Activation Hill Functions: = vdif. We let 



(13) 



a; 



We then obtain: 



1 + y~] PijX? J (tk) ei(t k 
* — '3 J 
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This leads to 



= %Xi(t k ) + a tJ i" y (£*) + {t k )xi{t k ) 

- hjxT {h)e-i{h+x) + Ci, 
^ — 

= %Xi(t k ) + o# a:? + {t k )xi{t k ) 

- V" fyx™^ (t k )(xi(t k+1 ) - Xi{t k )) + Ci 
^ — 'j 

= liXi{t k ) + a^x^' + ^2 hjx]' 3 {h)xi{t k ) 

- 22 (^)^(tfc+i) + 22. (3ijtf lJ (t k )xi(t k ) + Ci, 

from which o;^-, /3jj, Cj can be estimated. We can then calculate 74, and a, 
2) Repression Hill Functions: = 0: We let 



a 



1.) 



Ciftij, "iiftiii y ~j j &ij Cj 



We then obtain: 



1 + J2 h*? 3 ^)} et(t k+1 ) 
* — '3 J 

[1 + ^2 . /vr (**)] + 5Z • c A x r (**) + ^ 



This leads to: 



7* 



ei(tfc+i) 

T^fe) + y^ 

- PijX™ l ' J ' (t k )(Xi(t k+1 ) - Xi(t k )) + Ci 
z — 

liXi{t k ) + ^ aijxJ tJ (t k ) + ^J. hjxj 10 (t k )xi(t k ) 



ih) + y^ Ptjx^ 13 {t k )xi{t k ) 
i. — ^ 



'.7 



from which 7^, a^-, /8y, $%j,Ci can be estimated. Based on these, we can then calculate 7^,/% 
and 

We thus see that the problem of identification of rational polynomial functions with unknown 
parameters in their denominator can be reduced to a problem of polynomial identification. This 
can be done following the approach presented in Section [Tll-B| by extending the basis functions. 

To consider various Hill coefficients rrij, e.g., all possible Hill coefficient values from the set 
{1, 2, 3, 4}, we just need to consider all the functions Gj(x) with these individual values of rrij 
and to add the corresponding monomial terms extracted from the above derivations as elements 
of the sensing matrix $. 



15/2/2012 



DRAFT 



7 



D. Problem Transformation 

Once the matrix $ is constructed so as to contain all the candidate basis functions that we 
want to consider in the reconstruction, the remaining task is to propose an efficient method that 



allows to solve for Wj in eq. (10): 



y. = $ Wi + £. ; (i = l,...,n) 



Typically the weighting vector Wj solution of p0| ) is k-sparse. Mathematically, we say that a 
signal w is k-sparse when it has at most k non-zero entries, i.e., ||w|| < k. We let 

Q k = { w : || w || < k} 

denote the set of all £>sparse signals. On one hand, biochemical reaction networks are typically 
sparse [13]. On the other hand, the acquisition of sufficient biological time series data over 
long time spans is quite difficult due to the typical cost of wet-lab experiments and current 
technological limitations in terms of the type and quality of the measurements. Furthermore, 
since the nonlinear form of the equation is typically unknown, there can potentially be a very 
large number of candidate functions. As a consequence, we typically have N ^> M for &mxn 
and Wj sparse. 



The linear regression problem ( [TO] ) can thus be defined as a compressive sensing, or sparse 
signal recovery problem [6], [10], with observation vector y is known regressor matrix $, un- 
known coefficients w», and additive noise S. In sparse problems, the prior belief is that only a 
small fraction of the elements appearing in Wj are non-negligible. The general aim is to identify 
the smallest subset of columns of $, whose linear span contains the observations y$. Identifying 
this smallest subset for all % — 1, . . . , n corresponds to looking for the sparsest w that satisfies 
the following equation: 

y = $ w + E. (14) 



The solution w* to eq. ( [T4| ) is, by definition, going to be sparse. Now, if all the elements in 
the i th row of w* are 0, then we can delete the i th row of w* and the i th column of <3>. After 



repeating this for all the rows of w*, we can equivalently rewrite (14) as 



$*w* + S*. (15) 



The solution of eq. ( [T5| ) is an estimate of the solution of the original problem defined in eq. f7]) 



which itself is related to the discretisation of eq. ([[]). 



IV. COMPRESSIVE SENSING 

A. Algorithm for Compressive Sensing 

To be consistent with eq. ( fit)] ), in this and next section, i = 1, . . . ,n. Since w, : is sparse, we use 
regularisation methods to recover Wj. Regularisation methods impose upper bound constraints 
on the l norm of Wj, ||wj|| , i.e., on the number of nonzero terms in Wj. A sparse approximation 
to Wj is readily obtained by solving the following optimisation problem: 

Wi =argmin{||yj - $Wi\\l + Pi ||wj|| }. (16) 

Wj 

Unfortunately, this optimisation problem is both numerically unstable and NP-complete; therefore 
some simplifications are typically used to recast this problem into another one for which efficient 
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algorithmic solutions exist. The most common simplification is to use the /x-norm instead of the 
/ -norm [6], so that the optimisation problem becomes 



w ; =argmm{||y.j 



|2 . 

I 2 + Pi 



w,- 



(17) 



The approach presented in eq. ( [17] ) is known as Lasso [23]. A number of methods have been 
proposed to solve Compressive Sensing (CS) reconstruction problems, including l\ -minimisations 
[6], [7] and greedy algorithms [8], [11], [27]. Both methods can actually be shown to have 
performance guarantees for the exact recovery of w, if $ satisfies the so called restricted isometry 



property [6]. We will discuss this in more detail in Section VII 



Since $ as defined in ([9]) could be rank deficient, ( [17] ) cannot, in general, be readily solved 
by these two algorithms and thus a different approach needs to be considered. Thanks to recent 
results in signal processing and machine learning, we propose hereafter a method to solve the 



compressive sensing problem defined in (17) using a Bayesian formulation [14], [24]-[26]. 



B. Bayesian Compressive Sensing 

1 ) Linear Regression and Maximum Likelihood: Bayesian modelling treats all unknowns as 
stochastic variables with certain probability distributions [3]. Using a Bayesian framework, we 
assume that the stochastic variable & are independent and characterised by a Gaussian distribution 
with zero-mean and variance of. We further define the precision or inverse- variance as = 1/af . 

The multivariate Gaussian likelihood of y, is then given by: 



p(y*|wi, $ 



A/Xy^w^- 1 
(2na* 



M 

2 exp 



1 

2a 



2 ny< 



$w„- 



(18) 



Obtaining maximum likelihood estimates for Wj under these conditions is equivalent to searching 
for a minimal / 2 - n orm solution to eq. ([7]). 

2) Sparseness Prior: The sparseness of Wj can be imposed by using a Laplace prior of the 
form 



p(w<|A) 



exp(-Ai ||w 



(19) 



This then becomes equivalent to the /i-regularisation formulation in eq. ( 17 ). Using the likelihood 



function in eq. ( [18] ) and the prior in eq. ( [19] ), the solution of eq. ( [17] ) can be shown to correspond 
to a maximum a posteriori (MAP) estimate for Wj. 

However, obtaining an estimate of the full posteriors on w< and fa cannot be accomplished 
using the Laplace prior directly since Laplace priors are not conjugate to the conditional Gaussian 



distribution in eq. ( fl8| ). In the following section, we show that the use of hierarchical priors 
alleviates this. The CS problem can thus be converted into a linear-regression problem with a 
prior which is sparse. Given the sensing matrix $, we show in the next section how the sparse 
weights Wj and the inverse of noise variance can be estimated using in a sparse Bayesian 
learning approach. 
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V. RECONSTRUCTION VIA BAYESIAN COMPRESSIVE SENSING 

A. Specification of Hierarchical Priors 

Instead of imposing a Laplace prior on Wj, as defined in the last section, a sparse Bayesian 
learning approach is adopted [25]. Namely, we use hierarchical priors over the distribution of 
Wj. This allows for a convenient conjugate-exponential analysis with similar properties as for 
the Laplace priors. The main advantage of such an approach is that it allows us to impose a 



Gaussian prior with zero-mean on each element of Wj (see (fTTj)), i.e., 

A? 



p(wi\ai) = n^K-IO,^. 1 ). (20) 



In ( [20] ), a.i = (an, . . . , aw) G R lxN represent a vector of N independent hyperparameters, with 
controlling the precision (or the inverse of the variance) of the prior imposed on Wj. It is 
this form of prior that is eventually responsible for the sparsity properties of the model (see [25] 
for more details). It is common to place a Gamma prior on as follows: 

N 

p(oLi\a,b) = JJr(a ii |a,6). (21) 

where the Gamma distribution is defined as: 

m\a,b) = ^- ) e~ 1 exp[-b^] 

where T(a) = J °° t a ~ 1 e~ t dt is called the 'Gamma function', £ > denotes a hyperparameter, 
and a > is the shape parameter, b > is a scaling parameter. The T distribution is generally 
chosen as the prior for the precision of a Gaussian distribution because (a) it corresponds to 
its conjugate prior, thereby greatly simplifying the analysis and (b) it also includes the uniform 
distribution as a limiting case. The overall prior on Wj is then evaluated as 

N roc 

p(wi\a, b) = TT / Af(uJi j \0,a^ j 1 )T(ai j \a,b)da ij . (22) 

3=1 J ° 

The density function T(aij\a,b) is the conjugate prior for a^ when plays the role of 
observed data and _A/"(u>y|0, a^ 1 ) is the associated likelihood function. Based on this, the integral 
/ °° Af(uiij\0, a^j )T(aij\a, b)daij can be evaluated analytically. It can be shown that this integral 
corresponds to the Student's ^-distribution which is strongly peaked around Uij = 0. Conse- 



quently, the prior in (22) is a sparseness prior for Wj. Similarly, a Gamma prior is introduced 

on ^ 

p(f3 i \c,d) = T(f3 i \c } d). (23) 



B. Bayesian Inference via Relevance Vector Machine 

Given and $ and assuming that the hyperparameters CKj and fa are known, the posterior 
distribution for w< conditioned on the data is given by combining the likelihood and prior with 
Bayes' rule: 



p(wi\yi, aci, fa) = 





W h fa)p(Wi 


OLi) 


p(yi 


ecu fa) 



(24) 
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This posterior distribution of w* is Gaussian and the associated mean and covariance matrices 
are given as 

m« = ft£* r y, (25) 
= (A l + /3^ T ^)- 1 (26) 

where A; = diag(ay). In the context of Relevance Vector Machines (RVM), the associated 
"learning" problem becomes the search for the hyperparameters cxi and fa. RVM is a Bayesian 
sparse kernel technique for regression and classification [25]. It shares a lot of characteristics 
with support vector machines [3]. In RVM, these hyperparameters are estimated from the data 
by maximising 

p(y j |a„ft) = /p(y>„ftWw i |a i ) (i w ( . 

This quantity is known as the marginal likelihood. The maximisation is known as a type-2 
maximum likelihood [2] or evidence approximation maximisation [16]. 

To avoid the convolution of two Gaussians, one can use log marginal likelihood £(ai,ft). 
This leads to 

C(tXi,Pi) = log p{yi\oci, ^) 

= log / p(yi\wi, Pi)v{^i\oti)dWi 



= logJVfolO.Ci) 

= -i[Mlo g 27r + log|C i |+yfCr 1 y f ] 

(27) 

where the M x M matrix Q = of I + QA^ 1 ^. A type-2 maximum likelihood approximation 
employs the point estimates for en,; and (3j to maximise C(ati,/3j). By setting the required 
derivatives of the marginal likelihood to zero, we can obtain the following re-estimate equations 
[16], [25] 

a-™ = %(j = l,2,...,iV), (28) 

m ij 



where is the ith posterior mean weight appearing in (25) and 7^ = 1 — ajjS^jj), with 
Sj(jj) representing the jth diagonal element of the posterior weight covariance defined in (26). 
Estimation of 7^ and can be efficiently done via the expectation-maximisation algorithm, 
an iterative procedure for maximum likelihood parameter estimation from data sets with missing 
or hidden variables [9]. 

For a noise variance of = 1/Pu the re-estimate equation is 

car)- 1 = ^rt m ^ - W 

j 



The quantity 7^ in ( [29] ) measures how well the corresponding parameter ojij appearing in ( [T0| ) 
is determined by the data. Note that a^ w , j = 1, ... ,7V and (3™ ew are functions of and 



Sj in equations (|28])-([29|). Furthermore, nij and Sj are a function of and fti in equations 



(25)-(26). This suggests an iterative algorithm, which iterates between equations (25)-(26) and 



15/2/2012 



DRAFT 



11 



equations { 28 )-(]29[) until a convergence criterion has been satisfied. During re-estimation, many 



of the «jj tend to infinity when the corresponding Wj,- are very small or zero. Usually, only a 



few aij are small. From ( [24] ), this implies that p(ufy-|yi, on, fa) becomes highly (in principle, 
infinitely) peaked at zero. This implies that we can be a posteriori 'certain' that Uij are zero. The 
corresponding basis functions in $ can thus be 'pruned', and sparsity is realised [25]. In [24] 
and [26], a fast RVM algorithm was developed by analysing the properties of the log marginal 



likelihood function in eq. (27). This enables a principled and efficient sequential addition and 
deletion of candidate basis function (columns of <&) to monotonically maximise the log marginal 
likelihood. Details about this fast algorithm can be found in [24] and [26]. 

VI. IDENTIFICATION OF THE REPRESSILATOR 

We consider here a classical dynamical system in systems/synthetic biology which we will 
use to illustrate the biochemical network reconstruction problem at hand. The repressilator is a 
synthetic oscillator network that was originally conceived and constructed by Elowitz and Leibler 
[12]. The network consists of three genes repressing each other in a ring structure. 

A mathematical description the repressilator that includes both transcription and translation 
dynamics can be described as 

dx 2 a 2 , „ 

dx 3 o 3 
-jr = -73X3 + — — ^ + 3 , 
dt 1 + £5 



dxi 

~dT 

dx 5 

~dT 
dx 6 

~dT 

Here, x±,X2,x 3 denote the concentrations of the mRNA transcripts of genes 1, 2, and 3, respec- 
tively whereas 24,0:5, x 6 denote the protein concentrations of the respective genes. 0:1,0:2,03 
denote the maximum promoter strength for their corresponding gene, 71, 72, 73 denote the mRNA 
decay rate, 74, 75, 76 denote the protein decay rate, fa, fa, fa denote the protein production rate, 



-74X4 + faxi, 
"75^5 + fax 2 , 

-7 6 x 6 + fax 3 , (30) 



61,62, 6 3 denote the basal transcription rate. The set of ODEs in (30) corresponds to a topology 
where gene 1 is repressed by gene 2, gene 2 is repressed by gene 3, and gene 3 is repressed 
by gene 1. Using the standard forward Euler method to numerically solve ODEs, we obtain 
trajectory of the six states X\, . . . ,x e in Fig. [T] These trajectories are then sampled to generate 
a time-series of gene expression data. 

Take gene 1 for example. The hill coefficient n\ will typically have a value within a range 
from 1 to 4 due to biochemical constraints. The core question here is: how can we determine 



the topology and kinetic parameters of the set of ODEs in (30)? Note that we do not assume 
a priori knowledge of the form of the nonlinear functions, e.g., whether the degradation obeys 
first-order or enzymatic catalysed dynamics or whether the proteins are repressors or activators. 
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Next we show how the network construction problem of the repressilator model in ( pO] ) can 
be formulated under the form presented in pO] ). 

Following the procedure in ([2]), Q and (|6]), we construct a matrix of candidate functions 
$ by selecting the most commonly used candidate basis functions used to model biochemical 



reaction networks (see Sections III-B and III-C). As a proof of concept, we only consider Hill 
functions as potential nonlinear candidates. The set of Hill functions with Hill coefficient i, both 
in activating and repressing from, for each of the 6 state variables reads: 



hffl»(tfe) 



1 



1 



l + x\(t k ) 
x\(t k ) 



4(*fc) 



i + x\{t k y"' 1 + 4(4) 



1x12 



where i represents the Hill coefficient. In what follows we consider that the Hill coefficient 
can take any of the following values: 1, 2, 3 or 4. To also take into account basal transcription 
expression rates, we add to the last column of $ a unit vector. Since there are 6 state variables, 
we can construct the basis matrix $ appearing in pT| ) with 6 (basis functions for linear terms) 
+(4* 12) = 48 (basis functions for hill functions) +1 (basis function for basal expression) = 55 
columns. 



Considering (10) with the basis function matrix $ given in (31), the corresponding target 
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weight matrix w should be: 
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As an illustration, we choose the parameter values for w as: 
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The algorithm is implemented in MATLAB R2009a. The calculation is run on a standard laptop 
computer (Intel Core2 Duo P8600 2.4GHz with 4GB RAM). We run the algorithm for the 
following two experiments independently with 100 rounds respectively. The sparsest and closest 
solutions were selected. The computation time for each round is less than 0.1 second. 

1) Let tr = 50, sampling interval t k +i — t k = 0.1, and T = 500. The variance of noise is 
fixed at 10~ 3 . The estimated w is 
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2) Let t T = 5, sampling interval t k+ \ — t k = 0.1, and T = 50. The variance of noise is fixed 
at 10~ 3 . The estimated w is 
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where the basal rate could not be estimated in the lesser sample case. 

VII. CONCLUSION AND DISCUSSION 

In this paper, a new network reconstruction method for biochemical reaction networks is 
proposed. This method takes advantage of compressive sensing and sparse Bayesian learning. 
The proposed method only requires time series data and does not assume prior knowledge about 
the model structure (topology and nonlinear functional forms) and parameters. The problem is 
posed in such a way that candidate nonlinear functions specific to the type of models used 
(here biochemical reaction networks) are sought after. Since biochemical reaction networks are 
typically sparse, the key idea is to adopt a formulation which allows to transform the nonlinear 
identification problem into a compressive sensing problem and to solve it efficiently using a sparse 
Bayesian learning approach. We have illustrated how this approach can be used to efficiently 
reconstruct the nonlinear ODEs of a repressilator based on time series data. 

However, beyond the results presented here, some issues remain to be further discussed and 
are part of ongoing work in our group. 

A. Performance Guarantee 

A sufficient condition for exact reconstruction with either /i-minimisation or greedy algorithms 
is the so called restricted isometry property (RIP): A matrix $ e ]R MxAr is said to satisfy the 
RIP with coefficients (K, 5) for K < M, < 5 < 1, if for all index sets I C {1, . . . , N} such 
that |/| < K and for all q e one has 

(1-5) \\q\\l< \\$ iq \\ 2 2 <(l + S) \\q\\l 

where $j denotes the matrix formed by the columns of $ with indices in /. It was shown in 
[5], [6], [8] that both /i-minimisations and greedy algorithms lead to exact reconstruction of 
fT-sparse signals if the matrix $ satisfies the RIP with a constant parameter < 5 < 1. 

However, the Bayesian method proposed in this paper has no known performance guarantee 
in terms of exact recovery equivalent to the RIP condition. Establishment of such performance 
guarantees in the Bayesian compressive sensing framework is part of ongoing work in our group. 

B. Hidden Nodes 

We have so far assumed that the system is fully observable. However, in reality, measurement 
observations will typically be partial [28] (in particular, the number of hidden/unobservable nodes 
and their position in the network is usually unknown). How to generalise our framework to the 
case of hidden nodes and partial observations is another topic of further study in our group. 
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